perm filename DOC[AP,SYS] blob sn#010733 filedate 1972-08-03 generic text, type T, neo UTF8
COMMENT ⊗   VALID 00010 PAGES 
RECORD PAGE   DESCRIPTION
 00001 00001
 00002 00002	USING THE ASSOCIATED PRESS EXTRACTER:
 00006 00003	
 00009 00004	USING THE ASSOCIATED PRESS HOT LINE:
 00011 00005	HOW THE ASSOCIATED PRESS EXTRACTER WORKS:
 00013 00006	DESCRIPTION OF THE FILES USED:
 00017 00007	FILE USAGE FOR THE AP NEWS FILING, CATALOGING AND RETRIEVING SYSTEM.
 00019 00008	FILER:
 00023 00009	DOER:
 00026 00010	
 00027 ENDMK
⊗;
USING THE ASSOCIATED PRESS EXTRACTER:


YOU TYPE:			IT ANSWERS:

_____________________________________________________________________

R APE				a. WAITING FOR GODOT...

				b. KEYWORDS:
			___________________
	In this initial step, you have indicated a desire to read
the news from the AP (Associated Press) news line. 
a.  The program has to wait for another program to let go of a file.
    If you wait, it should come back with KEYWORDS:.
b.  The program has responded with a request for keywords. 
    It knows roughly 700 keywords (at present) and most of them are
    proper nouns. Your answer to the question KEYWORDS: should
    be to type in words that you are interested in reading about,
    followed by an altmode (represented below by $).
		[expand this section]
_____________________________________________________________________

				KEYWORDS:

(VIETNAM+KOREA)*FOO$		a. *** UNRECOGNIZED KEYWORD: FOO ***

				b. *** MISSING RIGHT PARENTHESIS ***

				c. *** MISSING KEYWORD ***

				d. *** SYNTAX ERROR ***

				e. NO NEWS ITEMS FOUND

				f. 004 NEWS ITEMS FOUND
				   Read them now?
			__________________

a. This means that FOO is not a keyword.  If you would like specific words
   added to the list of keywords, SEND a note to ME or EMC.
b. A left parenthesis was never closed by a matching right parenthesis.
c. This means the parser was expecting a keyword but found something
   else (like an operator or a right parenthesis).
d. The parser thought it was finished, but your input string still had
   characters in it (not counting spaces, tabs, and CRLFs).
e. There has been no news about your keywords during the last
   approximately 24 hours.
f. Success.
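
The error messages above suggest a small recursive-descent parser. The
sketch below is illustrative only and is not the APE source. It assumes
that "+" means OR and "*" means AND, with "*" binding tighter (the text
does not define the operators), and the set KNOWN is a made-up stand-in
for the real dictionary of keywords.

```python
KNOWN = {"VIETNAM", "KOREA", "NIXON"}  # hypothetical stand-in for DICT


class QueryError(Exception):
    """Raised with the message APE would print for a bad query."""


def tokenize(text):
    """Split a query into keywords and the symbols ( ) + *."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\r\n":          # spaces, tabs and CRLFs are ignored
            i += 1
        elif ch in "()+*":
            tokens.append(ch)
            i += 1
        else:
            j = i
            while j < len(text) and text[j] not in " \t\r\n()+*":
                j += 1
            tokens.append(text[i:j].upper())
            i = j
    return tokens


def parse(text):
    """Return a nested tuple tree, or raise QueryError."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def primary():
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise QueryError("*** MISSING RIGHT PARENTHESIS ***")
            pos += 1
            return node
        if tok is None or tok in ")+*":
            raise QueryError("*** MISSING KEYWORD ***")
        pos += 1
        if tok not in KNOWN:
            raise QueryError(f"*** UNRECOGNIZED KEYWORD: {tok} ***")
        return tok

    def term():                      # assumed: "*" is AND
        nonlocal pos
        node = primary()
        while peek() == "*":
            pos += 1
            node = ("AND", node, primary())
        return node

    def expr():                      # assumed: "+" is OR
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("OR", node, term())
        return node

    tree = expr()
    if pos < len(tokens):            # parser finished, input did not
        raise QueryError("*** SYNTAX ERROR ***")
    return tree
```

With this sketch, parse("(VIETNAM+KOREA)*FOO") raises the unrecognized
keyword error for FOO, and parse("VIETNAM KOREA") raises the syntax
error, because the parser is finished after VIETNAM but input remains.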
			__________________

 B.(alt)			004 NEWS ITEMS FOUND
				Read them now?

Typing only an altmode, with no keywords, lets you review the stories for the
last keyword expression you typed.
_______________________________________________________________________

______________________________________________________________________

				004 NEWS ITEMS FOUND
				Read them now?

3a.Y 		 		(The four stories will be output to your console.
				They are in the order of the most recent first,
				and separated by a row of stars (*****). Also,
				if there are corrections or additions, or if the 
				story has been broken into two smaller stories
				(takes), all will be output together as one story.)
				KEYWORDS:

 b.N				KEYWORDS:

 C.(anything else)		Direct the news where?(Tty,Spooler, and/or File)
______________________________________________________________________

				Direct the news where?(Tty,Spooler, and/or File)

4.(any combination of:)

 a. T (or TTY)			(Just like typing "Y" to "Read them now?".)

 b. S (or SPOOLER)		(if only S) @@@@
					    KEYWORDS:
				(if ST:    Just like typing "T".)
				(if SF:    Just like typing "F".)

 c. F (or FILE) 		Type filename (the extension .AP will be used):
			_____________________

b. If only S is your reply, one "@" will appear for each story found as they
   are read and filed. They are filed in a temporary file $NEWS0.AP which is
   deleted after it is spooled. If $NEWS0.AP exists, then $NEWS1.AP is tried, etc.
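
The naming rule for the temporary spool file can be sketched as follows.
This is only an illustration of "try $NEWS0.AP, then $NEWS1.AP, etc.";
the function name and the directory argument are hypothetical and do not
come from APE.

```python
import os


def next_spool_name(directory="."):
    """Return the first $NEWSn.AP name that does not exist yet."""
    n = 0
    while os.path.exists(os.path.join(directory, f"$NEWS{n}.AP")):
        n += 1
    return f"$NEWS{n}.AP"
```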
_________________________________________________________________________

				Type filename (the extension .AP will be used):

5. FOO				a. FILE ALREADY EXISTS!
				   Type filename (the extension .AP will be used):

				b. @@@@
				   KEYWORDS:
__________________________________________________________________________
USING THE ASSOCIATED PRESS HOT LINE:
YOU TYPE:			IT ANSWERS:

RU HOT				a. WELCOME TO THE AP HOT LINE...

				b. Cannot contact FILER (hot line supervisor program).
			_____________________

Since HOT tries to contact the supervisor program, there may be a
pause before any response is obtained from the computer.
a. You have just caused an interrupt in FILER and will now begin to receive
	the news as it comes over the line. The news should
	come a buffer at a time, with a pause in between each buffer.
	However, if no news at all is coming over the line, you may sit
	there for some time without seeing anything.
b.  Something is wrong with the program that reads the news from the AP line. 
	Sorry, you lose.
HOW THE ASSOCIATED PRESS EXTRACTER WORKS:

	Six programs are needed to categorize the news. They are,
with brief descriptions:

FILER: The program that reads the stories from the line, converts them to
	ascii, and eventually files them into the NEWS file. It also
	updates a pointer in the INDEX file to point to the new story it has
	just filed. FILER sends output to the HOT line, and starts DOER and 
	INITER ptys.
DOER: The program that is started when FILER finishes with a story. It reads
	in the new news story and, after alphabetizing the words, searches 
	for the words in the DICT (dictionary of keywords) and fixes the 
	appropriate links in the LINKS file. 
APE: The user program. It uses the links prepared by DOER in retrieving stories
	by keyword.
INITER:	The initializing program. It is run the first time the AP News system
	comes up, and initializes WORDS, DICT, and LINKS. It also takes input from
	the list of sorted keywords WORDS.SRT and puts them in the form of
	DICT. If the entries of the dictionary are to be changed, this
	program must be run again.
HOT: Another user program. This program simply sets a bit in FILER
	corresponding to the User job number.
SORT: Sorts the keywords in WORDS.TXT and puts them in WORDS.SRT.
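
HOT is described above as setting a bit in FILER that corresponds to the
user's job number. A minimal sketch of that idea follows. It assumes
that bit n of a single integer mask stands for job n; the real layout
inside FILER is not given in this document, and both function names are
hypothetical.

```python
def subscribe(mask, job_number):
    """Return the mask with the bit for job_number turned on."""
    return mask | (1 << job_number)


def subscribed_jobs(mask):
    """List the job numbers whose bits are set, lowest first."""
    jobs = []
    job = 0
    while mask:
        if mask & 1:
            jobs.append(job)
        mask >>= 1
        job += 1
    return jobs
```

FILER would then send each converted buffer to every job returned by
subscribed_jobs.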
DESCRIPTION OF THE FILES USED:

DICT		LINKS		INDEX		NEWS
__________     __________     __________     _________________
|   |    |    |    |     |   |   |   |  |    |                |
|   |    |    |    |     |   |   |   |  |    |                |
| 1 | 2  |    | 3  |  4  |   | 5 | 6 | 7|    |                |
|   |    |    |    | 0   |   |   |   |  |    |                |
|   |    |    |    | ↑   |   |   |   |  |    |                |
|   |   ↓|←←←←|←←←↑| ↑   |   |   |   |  |    |                |
|   |   X→→→→→|X  X| X  X|→→→|  X|   |  |    |                |
|   |    |    |↓  ↑|     |   |  ↓|   |  |    |                |
|X__|x___|    |0__X|_____|   |__0|___|__|    |________________|
 ↓   ↓→→→→→→→→→→→→→→→→
_↓______________    _↓______________
|		|  |		   |
|    WORDS	|  |    MULTS	   |
|		|  |               |
|_______________|  |_______________|


DICT: Dictionary of keywords.
	1. Left half:	holds pointer into WORDS, which holds the actual keywords
	   Right half:	vacant (possible site for Twin keyword link).
	2. Left half:	holds pointer to MULTS (multiple word keywords, ex: UNITED STATES).
	   Right half:	holds pointer to first occurrence of this word in a story.
LINKS: Holds pointers to all words in the same story and all stories with the same word.
	3. Left half:	holds pointer to the same word in a different story.
	   Right half:	holds back pointer to same word in a different story.
	4. Left half:	holds the pointer to a different word in the same story.
	   Right half:	holds the pointer to the index for this story.
INDEX: Points to the actual stories in NEWS.
	5. Left half:	backpointer to the first word in this story.
	   Right half:	holds the pointer to adds, corrections, and multiple
			takes for this story.
	6. Left half:	holds the record number where this story can be found
			in NEWS
   	   Right half:	holds the displacement of the story from the beginning of
			the record.
	7. Holds the number of the story as it came over the wire. Used in looking
	   back to link together multiple takes and adds and corrections.
NEWS: Holds the actual news stories in ascii.
WORDS: Holds the keywords in ascii. Keywords are limited to 20 characters.
MULTS:	Not implemented yet, but will hold the second word in multiple word
	keys, much like WORDS.
WORDS.TXT: Input file of keywords to be sorted.
WORDS.SRT: File of keywords after they are sorted.
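
The left-half and right-half pointers described above can be sketched in
a few lines. The sketch assumes a 36-bit word split into two 18-bit
halves, as on the PDP-10; this document does not state the widths. It
also assumes that a pointer of 0 ends a chain, as the 0's in the diagram
suggest. The function names and the dictionary used to stand in for the
LINKS file are hypothetical.

```python
HALF_BITS = 18                      # assumed half-word width
HALF_MASK = (1 << HALF_BITS) - 1


def pack(left, right):
    """Combine two half-word pointers into one word."""
    if not (0 <= left <= HALF_MASK and 0 <= right <= HALF_MASK):
        raise ValueError("half-word out of range")
    return (left << HALF_BITS) | right


def left_half(word):
    return (word >> HALF_BITS) & HALF_MASK


def right_half(word):
    return word & HALF_MASK


def stories_for_keyword(first_link, links):
    """Follow the same-word chain through LINKS.

    links maps a LINKS address to its two words (word 3, word 4).
    Returns the INDEX pointers of every story that has the keyword.
    """
    result = []
    addr = first_link               # from the right half of DICT word 2
    while addr != 0:                # assumed: 0 marks the end of a chain
        word3, word4 = links[addr]
        result.append(right_half(word4))   # pointer to INDEX entry
        addr = left_half(word3)            # same word, different story
    return result
```

For example, with two LINKS entries at addresses 10 and 20 whose index
pointers are 5 and 7, stories_for_keyword(10, links) returns [5, 7].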
FILE USAGE FOR THE AP NEWS FILING, CATALOGING AND RETRIEVING SYSTEM.


------------------------------------------------------------------------------
SOURCE FILES	DESCRIPTIONS

FILER		PROGRAM TO READ FROM AP LINE AND FILE STORIES.
DOER		PROGRAM TO CATALOG STORIES.
APE		PROGRAM TO RETRIEVE STORIES CONTAINING GIVEN KEYWORDS.
INITER.SAI	PROGRAM TO INITIALIZE THE FOLLOWING FILES: WORDS, DICT, LINKS.
SORT		PROGRAM TO SORT THE FILE WORDS.TXT INTO THE FILE WORDS.SRT.
HOT		PROGRAM TO OUTPUT AP NEWS AS IT COMES IN OVER THE AP LINE.


------------------------------------------------------------------------------
DATA FILES	DESCRIPTIONS

NEWS		AP NEWS STORIES.
INDEX		INDEX INFORMATION INTO NEWS FILE.
LINKS		LINKS CONNECTING ALL WORDS IN SAME STORY AND ALL STORIES WITH SAME WORD.
DICT		POINTERS TO WORDS FILE AND LINKS FILE INDICATING CATALOGING OF STORIES.
RELATS		POINTERS TO: TWIN, SON, BROTHER.
MULTS		MULTIPLE WORD KEYWORDS.
WORDS		CHARACTERS IN THE KEYWORDS IN THE DICTIONARY.
WORDS.TXT	TEXT FILE USED TO TYPE IN DICTIONARY FOR INITIALIZATION.
WORDS.SRT	SORTED VERSION OF WORDS.TXT
FILER:

1. INITIALIZATION: Most of this part of the program is done only
the first time the NEWS program comes up. It ensures that
the program has the files it needs to work. If no NEWS file
exists, NEWS and INDEX files are created. INITER is also run from
this part of the program. If DOER isn't running, it is started
on a pty after INITER is finished.

2. SEARCHING FOR THE BEGINNING OF A STORY: FILER waits for characters
to appear in the AP buffer. When some do, it converts them to ascii
and then sends a letter containing the news to all of the
jobs running the HOT line. It then checks whether the buffer
contains the beginning or the middle of a story.
If it contains only the middle, FILER throws out the characters and
waits for more. If the beginning of a story (marked by
"A digit digit digit" LF) is in the buffer, all the garbage before
the story is thrown out and we proceed to 3.

3. SEARCHING FOR THE END OF A STORY: Here FILER does the same character
grabbing from the buffer that it did before except that it now looks
for the three LF's signalling the end of a story, and it deposits
and sends to the HOT line all the characters until it has found the end.
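
The boundary tests in steps 2 and 3 can be sketched as follows. The
sketch works on text that has already been converted to ascii. It
assumes that the start marker is the letter A followed directly by three
digits and a LF (for example "A123"), which is one reading of
"A digit digit digit". The function name is hypothetical, and the real
FILER works a buffer at a time rather than on one long string.

```python
import re

START = re.compile(r"A\d{3}\n")     # assumed form of the start marker
END = "\n\n\n"                      # three LFs end a story


def extract_stories(stream):
    """Return the complete stories found in stream, in order."""
    stories = []
    pos = 0
    while True:
        m = START.search(stream, pos)
        if m is None:
            break                   # only garbage or mid-story text left
        end = stream.find(END, m.end())
        if end == -1:
            break                   # story not finished; wait for more
        stories.append(stream[m.start():end])
        pos = end + len(END)
    return stories
```

Anything before a start marker is skipped, which corresponds to throwing
out the garbage before the story.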

4. PREPARE TO WRITE OUT THE NEWLY READ STORY: News stories are not allowed
to end exactly on a record boundary, so if this one does, we add an extra
word of null bytes to the end of it. The INDEX file is opened and the
special pointers to the oldest story (OLD) and the place for the new
story (NEW) are read in. The length of the newly read story is added to
the end of the last story in NEWS, and if the story will not fit (if it
would overlap past OLD, or if it won't fit into the bottom of the file
without wrapping around), fixups are undertaken. If an old story must be
deleted, all the words associated with that story are returned to the
available list in LINKS and pointers are cleared in DICT. OLD is then
updated.
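
The wrap-around behavior in step 4 resembles a circular buffer in which
the oldest stories are deleted to make room. The sketch below models
only the bookkeeping and is an assumption about how the fixups might
work, not the FILER code. Sizes are in arbitrary units, the class and
method names are hypothetical, and the cleanup of LINKS and DICT is
reduced to returning the list of deleted stories.

```python
from collections import deque


class NewsRing:
    def __init__(self, capacity):
        self.capacity = capacity
        self.new = 0                # where the next story will go (NEW)
        self.stories = deque()      # (offset, length), oldest first

    def add(self, length):
        """Place a story of the given length; return deleted stories."""
        if not 0 < length <= self.capacity:
            raise ValueError("bad story length")
        deleted = []
        if self.new + length > self.capacity:
            # Will not fit in the bottom of the file: drop whatever old
            # stories still sit between NEW and the end, then wrap.
            while self.stories and self.stories[0][0] >= self.new:
                deleted.append(self.stories.popleft())
            self.new = 0
        start, end = self.new, self.new + length
        # Delete old stories that the new one would overwrite.
        while self.stories:
            offset, size = self.stories[0]
            if offset < end and offset + size > start:
                deleted.append(self.stories.popleft())
            else:
                break
        self.stories.append((start, length))
        self.new = end
        return deleted
```

After each call the offset of the first entry in stories plays the role
of OLD.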

5. WRITE OUT NEWS STORY: Now the NEWS file is opened for updating. The correct
record is read in and then written out again with the new NEWS story.
At this point a letter is sent to DOER, telling it to go to work if it isn't already.

6. Before we return to 2., we must move the last record of the last news story
up to the top of the story buffer in preparation for writing it out the next 
time.
DOER:

1. WAIT FOR MAIL: DOER is started with a letter from FILER, telling
it that there is another story to categorize.

2. READ AN UNCATALOGED STORY FROM THE NEWS FILE: DOER first reads
in the INDEX file and grabs the special pointers NEW (pointer for next
incoming story in INDEX) and UNDUN (pointer into INDEX of first uncataloged
story). If UNDUN has caught up with NEW, DOER goes back to waiting for mail.
Otherwise, it reads in the uncataloged story from the NEWS file.

3. CHECK FOR ADDS AND TAKES, AND ALPHABETIZE WORDS IN EACH STORY: The news
is initially put in a buffer called STORY, but is soon moved, text word
by text word, into another buffer called TEXT. As the first words are
being moved from STORY, they are also checked for the special word "TAKE"
and for the number of another story (meaning that it is an add or
correction). If either occurs, the appropriate link is set in INDEX and
we return to the previous step. Otherwise, an array called SORDID is
built, which contains pointers into TEXT and links used to alphabetize
the words. When the last word is moved from STORY, all the words have
been alphabetized in SORDID.
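
One way to read the description of SORDID is as a linked list that is
kept in alphabetical order while the words are moved. The sketch below
shows that reading. It is an assumption, not the DOER code: each SORDID
entry is modeled as a pair (index into TEXT, link to next entry), -1 is
used to mark the end of the chain, and the function names are
hypothetical.

```python
def alphabetize(words):
    """Move words into TEXT one at a time, linking SORDID in order."""
    text = []                       # the TEXT buffer, in story order
    sordid = []                     # [index into text, link to next]
    head = -1                       # -1 marks the end of the chain
    for word in words:
        text.append(word)
        entry = len(sordid)
        sordid.append([len(text) - 1, -1])
        # Walk the chain to find where the new word belongs.
        prev, cur = -1, head
        while cur != -1 and text[sordid[cur][0]] <= word:
            prev, cur = cur, sordid[cur][1]
        sordid[entry][1] = cur
        if prev == -1:
            head = entry
        else:
            sordid[prev][1] = entry
    return text, sordid, head


def in_order(text, sordid, head):
    """Read the words back by following the SORDID links."""
    out, cur = [], head
    while cur != -1:
        out.append(text[sordid[cur][0]])
        cur = sordid[cur][1]
    return out
```

TEXT keeps the words in story order, and only the links in SORDID
change, so the list is fully alphabetized as soon as the last word has
been moved.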

4. LOOK FOR KEYWORDS IN STORY: Now the DICT file is read in and the
SORDID list is compared to it for duplications. If one is found, an entry
is made into LINKS and the pointer is set in the DICT file. 
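
Because both the SORDID list and the dictionary are in alphabetical
order, the comparison in step 4 can be done in a single pass over each.
The sketch below shows such a merge. It is an assumption about the
method, since the text only says the two are compared, and it reports
the matching keywords instead of making entries in LINKS. The function
name is hypothetical.

```python
def find_keywords(sorted_story_words, sorted_dict):
    """Return the dictionary keywords that occur in the story."""
    found = []
    i = j = 0
    while i < len(sorted_story_words) and j < len(sorted_dict):
        word, key = sorted_story_words[i], sorted_dict[j]
        if word < key:
            i += 1                  # story word is not a keyword
        elif word > key:
            j += 1                  # keyword is not in the story
        else:
            if not found or found[-1] != key:
                found.append(key)   # report each keyword once
            i += 1
    return found
```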

5. UPDATE INDEX FILE: The index file is read in and the UNDUN pointer is
advanced. If UNDUN ≠ NEW, DOER goes back to step 2. Otherwise it returns
to step 1.
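
The interplay of UNDUN and NEW in steps 2 and 5 can be sketched as a
loop that runs until UNDUN catches up with NEW. The sketch assumes that
INDEX slots are numbered 0 to SIZE-1 and are reused in a circle; the
document does not say how INDEX entries are numbered. The dictionary
keys, the catalog callback, and the function name are hypothetical.

```python
def catalog_pending(index, catalog):
    """Catalog every story between UNDUN and NEW; return the slots."""
    done = []
    while index["UNDUN"] != index["NEW"]:
        slot = index["UNDUN"]
        catalog(slot)               # steps 2 through 4 for one story
        done.append(slot)
        index["UNDUN"] = (slot + 1) % index["SIZE"]   # step 5
    return done                     # caught up: go back to waiting
```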